Iterative selection using orthogonal regression techniques

Authors

  • Bradley C. Turnbull
  • Subhashis Ghosal
  • Hao Helen Zhang
Abstract

High-dimensional data are nowadays encountered in various branches of science, and variable selection techniques play a key role in analyzing such data. Two approaches to variable selection in the high-dimensional setting are generally considered: forward selection methods and penalization methods. In the former, variables are introduced into the model one at a time according to their ability to explain variation, and the procedure is terminated at some stage by a stopping rule. For ultra-high dimensional data, [Wang 2011] studied forward regression for variable screening. In penalization techniques such as the LASSO, an optimization procedure is carried out with a carefully chosen penalty function added, so that the solutions have a sparse structure. Recently, the idea of penalized forward selection was introduced by [Hwang, Zhang and Ghosal, 2009]. The motivation comes from the fact that penalization techniques like the LASSO admit closed-form expressions in one dimension, just like the least squares estimator; such a step can therefore be repeated in a forward selection setting until convergence. The resulting procedure selects sparser models than comparable methods without compromising predictive power. However, when the predictor is high dimensional, it is typical for many predictors to be highly correlated. We show that in such situations it is possible to further improve the stability and computational efficiency of the procedure by introducing an orthogonalization step. At each selection step, the variables available to enter the model are screened on the basis of their correlation with variables already in the model, thus preventing unnecessary duplication. The new strategy, called the Selection Technique in Orthogonalized Regression Models (STORM), turns out to be extremely successful in reducing the model dimension further and also leads to improved predictive power. We carry out a detailed simulation study comparing the newly proposed method with existing ones and analyze a real dataset. AMS 2010 Subject classification:
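
The core of the procedure can be illustrated compactly. The following is a minimal Python sketch of penalized forward selection with a correlation-screening step, written under stated assumptions rather than as the authors' exact STORM algorithm: the one-dimensional LASSO update has the closed form of a soft-thresholded marginal covariance, and candidates too correlated with a newly admitted variable are screened out. The function name storm_sketch and the tuning parameters lam, rho_max, max_steps and tol are hypothetical, introduced only for illustration.

    import numpy as np

    def soft_threshold(z, lam):
        # Closed-form solution of the one-dimensional LASSO problem.
        return np.sign(z) * max(abs(z) - lam, 0.0)

    def storm_sketch(X, y, lam=0.1, rho_max=0.9, max_steps=100, tol=1e-8):
        # Illustrative penalized forward selection with correlation
        # screening.  Assumes the columns of X are standardized
        # (mean 0, squared norm n) and y is centered; lam, rho_max,
        # max_steps and tol are hypothetical tuning parameters.
        n, p = X.shape
        beta = np.zeros(p)
        active, excluded = [], set()
        resid = np.asarray(y, dtype=float).copy()
        for _ in range(max_steps):
            best_j, best_b = None, 0.0
            for j in range(p):
                if j in excluded:
                    continue
                # closed-form one-dimensional penalized update
                b = soft_threshold(X[:, j] @ resid / n, lam)
                if abs(b) > abs(best_b):
                    best_j, best_b = j, b
            if best_j is None or abs(best_b) < tol:
                break  # no remaining variable explains enough variation
            beta[best_j] += best_b
            resid = resid - best_b * X[:, best_j]
            if best_j not in active:
                active.append(best_j)
                # screen out candidates highly correlated with the
                # variable that just entered, preventing duplication
                corr = X.T @ X[:, best_j] / n
                for k in np.flatnonzero(np.abs(corr) > rho_max):
                    if k != best_j and k not in active:
                        excluded.add(int(k))
        return beta, active

A call such as beta, active = storm_sketch(X, y) then returns a sparse coefficient vector together with the order in which variables entered the model.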

Similar Articles

Network Planning Using Iterative Improvement Methods and Heuristic Techniques

The problem of minimum-cost expansion of a power transmission network is formulated and solved with a genetic algorithm, incorporating the cost of new lines, security constraints, and Kirchhoff's law at each bus bar. A genetic algorithm (GA) is a search or optimization algorithm based on the mechanics of natural selection and genetics. An applied example is presented. The results from a set of tests carried ...

Elastic net orthogonal forward regression

An efficient two-level model identification method, aimed at maximising a model's generalisation capability, is proposed for a large class of linear-in-the-parameters models built from observational data. A new elastic net orthogonal forward regression (ENOFR) algorithm is employed at the lower level to carry out simultaneous model selection and elastic net parameter estimation. The two regularisa...
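
One reason such forward schemes stay cheap per step is that, as with the LASSO, the one-dimensional elastic net estimate has a closed form: soft-thresholding followed by ridge-style shrinkage. Below is a minimal sketch under that assumption; it is not the ENOFR algorithm itself, and lam1 and lam2 are illustrative penalty parameters.

    import numpy as np

    def elastic_net_1d(x, r, lam1, lam2):
        # Closed-form one-dimensional elastic net estimate:
        # soft-thresholding by lam1, then ridge-style shrinkage by
        # lam2.  Assumes x is standardized so that x @ x / n == 1.
        n = len(r)
        z = x @ r / n
        return np.sign(z) * max(abs(z) - lam1, 0.0) / (1.0 + lam2)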

Identification of nonlinear systems with non-persistent excitation using an iterative forward orthogonal least squares regression algorithm

A new iterative orthogonal least squares forward regression (iOFR) algorithm is proposed to identify nonlinear systems that may not be persistently excited. By slightly revising the classic orthogonal forward regression (OFR) algorithm, the new iterative algorithm searches for solutions over a global solution space. Examples show that the new iterative algorithm is computationally efficient a...

Thresholding-based Iterative Selection Procedures for Model Selection and Shrinkage

This paper discusses a class of thresholding-based iterative selection procedures (TISP) for model selection and shrinkage. The weakness of the convex l1 constraint (i.e., soft-thresholding) in wavelets has long been noticed, and many different forms of nonconvex penalties have been designed to increase model sparsity and accuracy. But for a nonorthogonal regression matrix, there is great d...
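
The basic iteration behind such procedures alternates a gradient step on the squared-error loss with a componentwise thresholding rule. Below is a minimal sketch using soft-thresholding; a hard or nonconvex thresholding rule can be substituted to increase sparsity. The function name tisp_sketch and the parameters lam and n_iter are illustrative assumptions, not the paper's exact procedure.

    import numpy as np

    def tisp_sketch(X, y, lam=0.1, n_iter=500):
        # Thresholding-based iterative selection: a gradient step on
        # the squared-error loss, then componentwise soft-thresholding.
        L = np.linalg.norm(X, 2) ** 2  # Lipschitz constant of the gradient
        beta = np.zeros(X.shape[1])
        for _ in range(n_iter):
            z = beta + X.T @ (y - X @ beta) / L
            beta = np.sign(z) * np.maximum(np.abs(z) - lam / L, 0.0)
        return beta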

An iterative orthogonal forward regression algorithm

A novel iterative learning algorithm is proposed to improve the classic orthogonal forward regression (OFR) algorithm, in an attempt to produce an optimal solution within a purely OFR framework without using any auxiliary algorithms. The new algorithm searches for the optimal solution over a global solution space while maintaining the advantages of simplicity and computational efficiency. Both...
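
For orientation, the classic OFR step that these iterative variants revise can be sketched as follows: each candidate regressor is orthogonalized against the already-selected terms by Gram-Schmidt, and the one with the largest error reduction ratio (ERR) enters. This is a minimal illustrative version under those assumptions, not the algorithm of any of the papers above.

    import numpy as np

    def ofr_sketch(X, y, n_terms):
        # Classic orthogonal forward regression: greedily select the
        # regressor whose component orthogonal to the already-chosen
        # terms gives the largest error reduction ratio (ERR).
        X = np.asarray(X, dtype=float)
        y = np.asarray(y, dtype=float)
        selected, Q = [], []
        for _ in range(n_terms):
            best_j, best_err, best_q = None, 0.0, None
            for j in range(X.shape[1]):
                if j in selected:
                    continue
                q = X[:, j].copy()
                for u in Q:  # Gram-Schmidt against selected terms
                    q -= (u @ q) / (u @ u) * u
                qq = q @ q
                if qq < 1e-12:
                    continue  # numerically collinear; skip candidate
                err = (q @ y) ** 2 / (qq * (y @ y))  # error reduction ratio
                if err > best_err:
                    best_j, best_err, best_q = j, err, q
            if best_j is None:
                break
            selected.append(best_j)
            Q.append(best_q)
        return selected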

Journal:
  • Statistical Analysis and Data Mining

Volume 6, Issue

Pages -

Publication date 2013